149 research outputs found
GraphR: Accelerating Graph Processing Using ReRAM
This paper presents GRAPHR, the first ReRAM-based graph processing
accelerator. GRAPHR follows the principle of near-data processing and explores
the opportunity of performing massive parallel analog operations with low
hardware and energy cost. The analog computation is suitable for graph
processing because: 1) The algorithms are iterative and could inherently
tolerate the imprecision; 2) Both probability calculation (e.g., PageRank and
Collaborative Filtering) and typical graph algorithms involving integers (e.g.,
BFS/SSSP) are resilient to errors. The key insight of GRAPHR is that if a
vertex program of a graph algorithm can be expressed as sparse matrix-vector
multiplication (SpMV), it can be performed efficiently by a ReRAM crossbar. We
show that this assumption is generally true for a large set of graph
algorithms. GRAPHR is a novel accelerator architecture consisting of two
components: memory ReRAM and graph engine (GE). The core graph computations are
performed in sparse matrix format in GEs (ReRAM crossbars). The
vector/matrix-based graph computation is not new, but ReRAM offers the unique
opportunity to realize the massive parallelism with unprecedented energy
efficiency and low hardware cost. With small subgraphs processed by GEs, the
gain from performing parallel operations outweighs the waste due to sparsity.
The experiment results show that GRAPHR achieves a 16.01x (up to 132.67x)
speedup and a 33.82x energy saving on geometric mean compared to a CPU baseline
system. Compared to GPU, GRAPHR achieves 1.69x to 2.19x speedup and consumes
4.77x to 8.91x less energy. GRAPHR gains a speedup of 1.16x to 4.12x, and is
3.67x to 10.96x more energy efficient compared to a PIM-based architecture.
Comment: Accepted to HPCA 2018
Surface layer evolution in glow discharge optical emission spectroscopy.
Glow discharge optical emission spectroscopy (GDOES) is a modern analytical technique for the analysis of the chemical composition of bulk materials and the depth profiling of multi-layer structures. Most research in the use of GDOES has concentrated on developing accurate methodologies for quantitative analysis and depth profiling. However, this thesis presents a study on various aspects of surface layer evolution under argon ion etching in GDOES. The GDOES technique relies on the ion bombardment of sample surfaces which removes material from the surface, layer by layer, on the atomic scale. During the surface layer evolution, the ion bombardment causes different surface micro-textures and preferential sputtering in individual crystallites, which can cause degradation of depth resolution in GDOES depth profiling. Experimental results using pure iron specimens in this study show a correlation between textures induced by GDOES sputtering and the sputtering rate, and a difference in the sputtering rate for crystallites with different crystal orientations. In studying ion bombardment by GDOES in semiconductors, a novel pitting morphology on the surface of a carbon-coated silicon wafer was observed and characterised in detail. This may have a potential application in the fabrication of micro-lens arrays. The generation and development of the pits were investigated, which are believed to be dependent upon the different sputtering rates between the film and the substrate. Geometric features of the pits were obtained using an atomic force microscope (AFM) and the sphere-like surface of the pit was confirmed. The experimental work in this study also shows that the Grimm source in GDOES is a powerful etching tool. Eroded surfaces of metal specimens with little damage to the crystallites and phase structures were obtained by GDOES etching. The method was found to be an ideal process for specimen preparation for electron back-scattered diffraction (EBSD).
The GDOES-etched surface of single crystal copper showed that the damaged layer formed by mechanical polishing using 6 micron diamond paste was about 1-2 μm and was removed after only a few tens of seconds of GDOES etching. GDOES etching was also applied to an investigation of internal oxides in carburised steels. The eroded surfaces provided plan views of the morphologies of internal oxides of carburised steels in scanning electron microscopy (SEM) images. Results of energy dispersive spectrometer (EDS)/SEM elemental mappings of different layers of the steels were in good agreement with GDOES depth profiles, which revealed that the elements Cr, Mn and Si were involved in the oxides. The last section of the thesis is about hydrogen detection in GDOES. The study includes a detailed analysis of: hydrogen contamination in GDOES, the hydrogen detection status of GDOES, the sample matrix effects on hydrogen detection and hydrogen effects on elemental concentrations in GDOES measurements. The experiments have confirmed that water vapour is the main source of the hydrogen contamination. When the GDOES system has stabilised, GDOES could be employed to differentiate specimens containing different concentrations of hydrogen. The experiments also showed that different hydrogen intensities could have resulted from different matrices even when the specimens were believed to contain no hydrogen. A possible explanation could be that variations of the γ-electron ejection from different matrices and different sputtered atoms in the glow discharge, which altered the plasma and the energy distribution in the glow region, resulted in the variation of the excitation of the hydrogen atoms in the source. However, there are still some results in the matrix effects which could not be explained. The experiments concerning the consequence of hydrogen effects on apparent elemental concentrations in GDOES measurements were also undertaken using two steel standards.
The results indicated that the hydrogen in the source has a negative effect on the signal from most of the metal elements in the specimens, and a positive effect on the non-metal and semiconductor elements.
EasyHeC: Accurate and Automatic Hand-eye Calibration via Differentiable Rendering and Space Exploration
Hand-eye calibration is a critical task in robotics, as it directly affects
the efficacy of critical operations such as manipulation and grasping.
Traditional methods for achieving this objective necessitate the careful design
of joint poses and the use of specialized calibration markers, while most
recent learning-based approaches using solely pose regression are limited in
their abilities to diagnose inaccuracies. In this work, we introduce a new
approach to hand-eye calibration called EasyHeC, which is markerless,
white-box, and offers comprehensive coverage of positioning accuracy across the
entire robot configuration space. We introduce two key technologies:
differentiable rendering-based camera pose optimization and consistency-based
joint space exploration, which enable accurate end-to-end optimization of the
calibration process and eliminates the need for the laborious manual design of
robot joint poses. Our evaluation demonstrates superior performance in
synthetic and real-world datasets, enhancing downstream manipulation tasks by
providing precise camera poses for locating and interacting with objects. The
code is available at the project page: https://ootts.github.io/easyhec.
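To give a feel for the optimization at the heart of this approach, the sketch below shrinks pose estimation to a toy problem: recovering a 2D camera translation by gradient descent on a reprojection-style loss. It stands in for the paper's differentiable rendering-based pose optimization only loosely; all names and numbers are illustrative.

```python
import numpy as np

def optimize_translation(points, observed, steps=200, lr=0.1):
    """Find translation t such that points + t ~= observed (least squares)."""
    t = np.zeros(2)
    for _ in range(steps):
        residual = (points + t) - observed     # stand-in for the rendering error
        grad = 2 * residual.mean(axis=0)       # d(mean squared error)/dt
        t -= lr * grad                         # gradient descent step
    return t

# Synthetic data: a known translation applied to a few 2D points.
true_t = np.array([0.3, -1.2])
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
obs = pts + true_t
t_est = optimize_translation(pts, obs)
```

In the full method, the residual would come from comparing a differentiable rendering of the robot mesh against the camera image, and the optimized variable would be the full 6-DoF camera pose.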
Low-Cost Floating-Point Processing in ReRAM for Scientific Computing
We propose ReFloat, a principled approach for low-cost floating-point
processing in ReRAM. A flexible and fine-grained floating-point number
representation stores exponent offsets relative to a shared base. The key
motivation is that, while the number of exponent bits must be reduced because
computation latency and hardware cost grow exponentially with it, convergence
still requires sufficient accuracy in the exponents. Our design reconciles
these conflicting goals by storing the exponent offsets from a common base
shared among the matrix values in a block, which is the granularity of
computation in ReRAM. Due to value locality, the differences among the
exponents in a block are small, so the offsets require far fewer bits to
represent the exponents. In essence, ReFloat enables the principled local
fine-tuning of
floating-point representation. Based on the idea, we define a flexible ReFloat
format that specifies matrix block size, and the number of bits for exponent
and fraction. To determine the base for each block, we propose an optimization
method that minimizes the difference between the exponents of the original
matrix block and the converted block. We develop the conversion scheme from
default double-precision floating-point format to ReFloat format, the
computation procedure, and the low-cost floating-point processing architecture
in ReRAM.
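A minimal sketch of the block-offset idea, assuming a toy block of positive values (zeros, negative values near zero, and the paper's concrete bit widths are not handled): each block stores one shared base exponent plus a few-bit offset and a fraction per element.

```python
import numpy as np

def refloat_encode(block, offset_bits=3):
    """Encode a block of floats as (base, few-bit offsets, fractions).

    Illustrative only: assumes nonzero values and ignores fraction rounding.
    """
    exps = np.floor(np.log2(np.abs(block))).astype(int)    # true exponents
    base = int(exps.min())                                 # shared block base
    offsets = np.clip(exps - base, 0, 2**offset_bits - 1)  # few-bit offsets
    fractions = block / np.exp2(base + offsets)            # magnitude in [1, 2)
    return base, offsets, fractions

def refloat_decode(base, offsets, fractions):
    """Reconstruct the block from base exponent, offsets, and fractions."""
    return fractions * np.exp2(base + offsets)

block = np.array([1.5, 3.25, 0.75, 2.0])
base, offsets, fractions = refloat_encode(block)
decoded = refloat_decode(base, offsets, fractions)
```

Because the exponents in this block span only [-1, 1], three offset bits suffice where a full double-precision exponent would need eleven; this is the value-locality argument in miniature.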
Back Attention Knowledge Transfer for Low-Resource Named Entity Recognition
In recent years, great success has been achieved in the field of natural
language processing (NLP), thanks in part to the considerable amount of
annotated resources. For named entity recognition (NER), most languages do not
have such an abundance of labeled data as English, so NER performance in those
languages is relatively lower. To improve performance, we propose a
general approach called Back Attention Network (BAN). BAN uses a translation
system to translate other language sentences into English and then applies a
new mechanism named back attention knowledge transfer to obtain task-specific
information from a pre-trained high-resource-language NER model. This strategy
can transfer the high-layer features of a well-trained model and enrich the semantic
representations of the original language. Experiments on three different
language datasets indicate that the proposed approach outperforms other
state-of-the-art methods.
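The transfer step can be sketched as attention from original-language token states over the English-side hidden states of a (hypothetical) pre-trained NER model, pulling a weighted mix of those features back to enrich the source representations. The NumPy sketch below is a hedged, shapes-only illustration, not the paper's exact mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def back_attention(src_states, en_states):
    """src_states: (m, d) original-language tokens; en_states: (n, d) English tokens."""
    scores = src_states @ en_states.T            # (m, n) similarity scores
    weights = softmax(scores, axis=1)            # attention over English tokens
    transferred = weights @ en_states            # (m, d) transferred features
    # Enriched representation: original features concatenated with transferred ones.
    return np.concatenate([src_states, transferred], axis=1)

rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8))    # 4 source-language tokens, hidden size 8
en = rng.normal(size=(6, 8))     # 6 tokens from the translated English sentence
enriched = back_attention(src, en)
```

In the full system, `en_states` would be high-layer activations of the pre-trained English NER model, and the enriched representations would feed the low-resource tagger.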
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
With the rise of artificial intelligence in recent years, Deep Neural
Networks (DNNs) have been widely used in many domains. To achieve high
performance and energy efficiency, hardware acceleration (especially inference)
of DNNs is intensively studied both in academia and industry. However, we still
face two challenges: large DNN models and datasets, which incur frequent
off-chip memory accesses; and the training of DNNs, which is not well-explored
in recent accelerator designs. To truly provide high throughput and energy
efficient acceleration for the training of deep and large models, we inevitably
need to use multiple accelerators to explore the coarse-grain parallelism,
compared to the fine-grain parallelism inside a layer considered in most of the
existing architectures. This poses the key research question of finding the best
organization of computation and dataflow among accelerators. In this paper, we
propose a solution HyPar to determine layer-wise parallelism for deep neural
network training with an array of DNN accelerators. HyPar partitions the
feature map tensors (input and output), the kernel tensors, the gradient
tensors, and the error tensors for the DNN accelerators. A partition
constitutes the choice of parallelism for weighted layers. The optimization
target is to find a partition that minimizes the total communication during the
training of a complete DNN. To solve this problem, we propose a communication
model to explain the source and amount of communications. Then, we use a
hierarchical layer-wise dynamic programming method to search for the partition
for each layer.
Comment: To appear in the 2019 25th International Symposium on
High-Performance Computer Architecture (HPCA 2019)
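The layer-wise dynamic program can be sketched as follows, assuming just two parallelism choices per layer (say, data-parallel vs. model-parallel) and made-up cost tables; the real HyPar cost model is derived from the sizes of the partitioned feature, kernel, gradient, and error tensors.

```python
def min_comm_partition(intra, trans):
    """Minimum total communication over per-layer parallelism choices.

    intra[l][p]: communication cost of layer l under parallelism choice p (0 or 1).
    trans[l][p][q]: cost of switching from choice p at layer l to q at layer l+1.
    """
    n = len(intra)
    best = list(intra[0])                       # best cost ending in choice p
    for l in range(1, n):
        best = [intra[l][q] + min(best[p] + trans[l - 1][p][q]
                                  for p in (0, 1))
                for q in (0, 1)]
    return min(best)

# Illustrative 3-layer example with invented costs.
intra = [[4, 1], [2, 5], [3, 3]]                # per-layer costs per choice
trans = [[[0, 6], [6, 0]], [[0, 1], [1, 0]]]    # switching penalties
result = min_comm_partition(intra, trans)
```

Here the cheapest assignment is not simply the per-layer minimum, because switching parallelism between layers incurs its own communication; the dynamic program trades the two off exactly as the paper's layer-wise search does.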